task vector
- Europe > Switzerland > Zürich > Zürich (0.14)
- Asia > China > Shanghai > Shanghai (0.04)
- North America > United States > Colorado > El Paso County > Colorado Springs (0.04)
- (3 more...)
- Information Technology > Artificial Intelligence > Vision (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.93)
Task Arithmetic in the Tangent Space: Improved Editing of Pre-Trained Models Guillermo Ortiz-Jimenez
We present a comprehensive study of task arithmetic in vision-language models and show that weight disentanglement is the crucial factor that makes it effective. This property arises during pre-training and manifests when distinct directions in weight space govern separate, localized regions in function space associated with the tasks.
- Europe > Switzerland > Vaud > Lausanne (0.04)
- North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
A Theoretical Framework for LLM Fine-tuning Using Early Stopping for Non-random Initialization
Sun, Zexuan, Raskutti, Garvesh
In the era of large language models (LLMs), fine-tuning pretrained models has become ubiquitous. Yet the theoretical underpinning remains an open question. A central question is why only a few epochs of fine-tuning are typically sufficient to achieve strong performance on many different tasks. In this work, we approach this question by developing a statistical framework, combining rigorous early stopping theory with the attention-based Neural Tangent Kernel (NTK) for LLMs, offering new theoretical insights on fine-tuning practices. Specifically, we formally extend classical NTK theory [Jacot et al., 2018] to non-random (i.e., pretrained) initializations and provide a convergence guarantee for attention-based fine-tuning. One key insight provided by the theory is that the convergence rate with respect to sample size is closely linked to the eigenvalue decay rate of the empirical kernel matrix induced by the NTK. We also demonstrate how the framework can be used to explain task vectors for multiple tasks in LLMs. Finally, experiments with modern language models on real-world datasets provide empirical evidence supporting our theoretical insights.
- North America > United States > Wisconsin > Dane County > Madison (0.14)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (3 more...)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- Asia > Malaysia (0.04)
- Asia > China > Heilongjiang Province > Harbin (0.04)
- Asia > China > Fujian Province > Xiamen (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.93)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- North America > United States > California > Santa Clara County > Palo Alto (0.04)
- (11 more...)
- Research Report > Experimental Study (0.92)
- Research Report > New Finding (0.67)
- North America > United States > Texas > Travis County > Austin (0.05)
- Europe > Poland > Podlaskie Province > Bialystok (0.04)
- North America > United States > Oregon > Multnomah County > Portland (0.04)
- (2 more...)
- Information Technology > Artificial Intelligence > Natural Language (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (0.93)
- Information Technology > Artificial Intelligence > Cognitive Science (0.93)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.69)
- North America > United States > California > Alameda County > Berkeley (0.04)
- Europe > Romania > Sud - Muntenia Development Region > Giurgiu County > Giurgiu (0.04)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- North America > Canada > Ontario > Toronto (0.04)
- North America > United States > North Carolina (0.04)
- North America > Dominican Republic (0.04)
- (3 more...)
- Asia > China > Guangdong Province > Shenzhen (0.04)
- North America > United States > Maryland > Baltimore (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- (6 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (1.00)
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.72)
- Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.67)